Recently, Subkoviak (1976) developed a single-administration procedure for estimating the reliability of a criterion-referenced test composed of items scored 0/1. The resulting reliability index is termed the coefficient of agreement. The procedure represents an important methodological development for criterion-referenced testing because, in line with suggestions by Hambleton and Novick (1973), the coefficient estimates the proportion of mastery classifications that are consistent on two test administrations, while avoiding the necessity of multiple test administrations. Application of the procedure requires an estimate of each examinee's relative true score (in the sequel, simply true score). This true score is defined as the expected value of the proportion correct score. Subkoviak (1976) suggests using linear regression true score estimates but raises a question about the adequacy of the estimates.

Although it is unlikely that the regression of true score on observed score is precisely linear, the regression function should be monotonically non-decreasing. Therefore, a linear regression function should provide a good approximation to the regression function (Dawes and Corrigan, 1974). In particular, when the true score variance is small, a situation that is common in criterion-referenced testing (Hambleton and Novick, 1973; Popham and Husek, 1969), the linear function should approximate the true regression function quite well. Thus the use of linear regression estimates may be expected to produce reasonably accurate estimates of the coefficient of agreement.

PURPOSE OF THE INVESTIGATION 
In light of the introductory remarks, the purpose of the study was to investigate the accuracy of coefficients of agreement estimated on the basis of three different true score estimates. The first two estimates were obtained for the $i$th examinee using the linear regression equation

[1]  $\hat\tau_i = \hat\beta\, p_i + (1 - \hat\beta)\,\hat\mu_p, \qquad i = 1, 2, \ldots, N,$

with $\hat\beta$ set equal to either the sample KR-20 or the sample KR-21 coefficient. The symbol $p_i$ is the observed proportion correct score, $\hat\mu_p$ is the sample mean proportion correct score, $\hat\tau_i$ is the estimated true score and $\hat\beta$ is an estimate of the slope parameter. The third true score estimate was simply $p_i$. These three estimates are referred to as the KR-20, KR-21 and proportion correct true score estimates. Once a true score estimate is obtained, an estimate of the coefficient of agreement, $\hat P_c$, for a given cut-off score $c$ can be computed using the formula

[2]  $\hat P_c = N^{-1} \sum_{i=1}^{N} \left\{ \left[\mathrm{Prob}(np_i \ge c \mid \hat\tau_i)\right]^2 + \left[\mathrm{Prob}(np_i < c \mid \hat\tau_i)\right]^2 \right\},$

with $\hat\tau_i$ estimating $\tau_i$ and $n$ equal to the number of items. In order to use equation [2], an assumption about the conditional distribution of $np_i$ must be made. Subkoviak (1976) suggests the binomial or compound binomial distribution.
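A minimal Python sketch of equations [1] and [2] follows. It is an illustration, not the authors' program. It assumes a 0/1 score matrix with one row per examinee, divide-by-N variances in KR-20 and KR-21, the binomial error model, and mastery defined as $np_i \ge c$. Clipping estimated true scores to [0, 1] is an added safeguard for the case of a negative sample coefficient. All function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def kr20(scores):
    """Sample KR-20 for a 0/1 matrix (rows = examinees, columns = items), n >= 2."""
    N, n = len(scores), len(scores[0])
    var_total = pvariance([sum(row) for row in scores])  # divide-by-N variance
    if var_total == 0:
        return 0.0
    difficulties = [sum(row[j] for row in scores) / N for j in range(n)]
    item_var = sum(q * (1 - q) for q in difficulties)
    return n / (n - 1) * (1 - item_var / var_total)


def kr21(scores):
    """Sample KR-21, which replaces the item variances by n * pbar * (1 - pbar)."""
    n = len(scores[0])
    totals = [sum(row) for row in scores]
    m, var_total = mean(totals), pvariance(totals)
    if var_total == 0:
        return 0.0
    return n / (n - 1) * (1 - m * (n - m) / (n * var_total))


def regressed_true_scores(scores, beta):
    """Equation [1]: tau_hat_i = beta * p_i + (1 - beta) * mean proportion correct."""
    n = len(scores[0])
    props = [sum(row) / n for row in scores]
    mu_p = mean(props)
    # Clipping to [0, 1] is an added assumption; it only matters if beta < 0.
    return [min(1.0, max(0.0, beta * p + (1 - beta) * mu_p)) for p in props]


def prob_at_least(n, tau, c):
    """Prob(n p_i >= c | tau) under the binomial(n, tau) error model."""
    return sum(math.comb(n, k) * tau**k * (1 - tau) ** (n - k) for k in range(c, n + 1))


def agreement_binomial(taus, n, c):
    """Equation [2]: mean over examinees of P(pass)^2 + P(fail)^2."""
    total = 0.0
    for tau in taus:
        p_pass = prob_at_least(n, tau, c)
        total += p_pass**2 + (1 - p_pass) ** 2
    return total / len(taus)


# Usage with a small hypothetical 4-examinee, 5-item score matrix.
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
]
for label, beta in (("KR-20", kr20(scores)), ("KR-21", kr21(scores)), ("proportion correct", 1.0)):
    taus = regressed_true_scores(scores, beta)
    print(label, round(agreement_binomial(taus, 5, 4), 3))
```

Passing a slope of 1.0 reproduces the proportion correct estimator, because equation [1] then returns $p_i$ unchanged.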

Accuracy of estimation was studied in terms of indices of bias and variability for coefficient of agreement estimators based on the three true score estimates. The accuracy of estimation should depend to some extent on the homogeneity of the examinees, the number of items, the number of examinees and the cut-off score used to make mastery decisions. The effects of these factors were investigated by a computer simulation of test performance.
DESCRIPTION OF THE SIMULATION
For each of six combinations of number of examinees (N = 10, 30) and number of items (n = 5, 10, 20), three matrices were constructed with elements $p_{ij}$ ($i = 1, \ldots, N$; $j = 1, \ldots, n$) representing the true probability of success for the $i$th examinee on the $j$th item. These matrices were used in simulating the responses of three groups of N examinees to n items. The true score variance, with true score defined by $\tau_i = n^{-1} \sum_{j=1}^{n} p_{ij}$, differed for the three groups. Values of parameters describing the 18 simulated tests are reported in Table 1. The parameters (true score variance, error variance, mean true score and reliability) are defined as

$\sigma^2_\tau = N^{-1} \sum_{i=1}^{N} (\tau_i - \mu_\tau)^2$,  $\sigma^2_E = N^{-1} \sum_{i=1}^{N} n^{-2} \sum_{j=1}^{n} p_{ij}(1 - p_{ij})$,  $\mu_\tau = N^{-1} \sum_{i=1}^{N} \tau_i$,

and

$\rho^2_{XT} = \sigma^2_\tau / (\sigma^2_\tau + \sigma^2_E)$.

Insert Table 1 About Here
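The four Table 1 parameters can be computed directly from a matrix of $p_{ij}$ values. The sketch below makes several assumptions. It takes $\tau_i = n^{-1}\sum_j p_{ij}$ and uses divide-by-N variances. It sets the error variance equal to the examinee average of $n^{-2}\sum_j p_{ij}(1-p_{ij})$, which is the variance of $p_i$ under independent Bernoulli items. Reliability is taken as $\sigma^2_\tau/(\sigma^2_\tau+\sigma^2_E)$. The function name is hypothetical. The final lines check the reliability entries for Examinations 1 and 3 against their variance entries.

```python
from statistics import mean, pvariance


def table1_parameters(pmat):
    """Return (true score variance, error variance, mean true score, reliability).

    pmat holds p_ij with rows = examinees and columns = items. The error variance
    definition (average of n^-2 * sum_j p_ij (1 - p_ij)) is an assumption of this sketch.
    """
    n = len(pmat[0])
    taus = [sum(row) / n for row in pmat]
    var_tau = pvariance(taus)
    var_err = mean(sum(p * (1 - p) for p in row) / n**2 for row in pmat)
    return var_tau, var_err, mean(taus), var_tau / (var_tau + var_err)


# Tiny hand-checkable example: two examinees, two items.
print(table1_parameters([[0.5, 0.5], [1.0, 1.0]]))  # (0.0625, 0.0625, 0.75, 0.5)

# Reliability entries of Table 1 recomputed from the variance entries.
print(round(0.0018 / (0.0018 + 0.0096), 2))  # Examination 1: 0.16
print(round(0.0148 / (0.0148 + 0.0092), 2))  # Examination 3: 0.62
```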
The computer simulation for each of the 18 tests was accomplished as follows:

1. Generate an N x n matrix of item scores by conducting Nn independent Bernoulli trials. The ij-th score takes the value 1 with probability $p_{ij}$ and the value 0 with probability $1 - p_{ij}$.

2. From the matrix of item scores, compute the three true score estimates for each of the N examinees.

3. Using the three true score estimates in conjunction with the binomial error model, compute three coefficients of agreement for each of the n cut-off scores (1, 2, 3, ..., n). These three coefficient of agreement estimators, and particular values of each estimator, are referred to as the KR-20, KR-21 and proportion correct estimators and estimates, respectively.

4. Repeat steps 1-3 for 100 independent replications.

5. Compute deviation statistics (see Tables 2, 3 and 4) over the 100 replications for the estimated coefficients.

"True" coefficients of agreement for the n cut-off scores were computed for each of the 18 matrices using the expansion of the compound binomial distribution given in Lord and Novick (1968, p. 525).
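The replication loop in steps 1-5, together with the compound binomial "true" coefficient, can be sketched in Python. This is a minimal illustration, not the authors' program, and it rests on several assumptions:

- Python's `random` module generates the Bernoulli trials.
- A deviation is defined as estimate minus true value.
- Standard deviations use a divisor equal to the number of replications.
- KR-20 regression estimates are clipped to [0, 1].
- The matrix of $p_{ij}$ values is small and hypothetical, not one of the 18 study matrices.

The KR-21 estimator is omitted for brevity. The compound binomial distribution is obtained here by convolving the item Bernoulli distributions. This yields the same distribution as the expansion cited from Lord and Novick, but it is not necessarily the computation they describe.

```python
import math
import random
from statistics import mean, pstdev, pvariance


def kr20(scores):
    """Sample KR-20 with divide-by-N variances (rows = examinees, columns = items)."""
    N, n = len(scores), len(scores[0])
    var_total = pvariance([sum(row) for row in scores])
    if var_total == 0:
        return 0.0
    difficulties = [sum(row[j] for row in scores) / N for j in range(n)]
    item_var = sum(q * (1 - q) for q in difficulties)
    return n / (n - 1) * (1 - item_var / var_total)


def binomial_tail(n, tau, c):
    """P(X >= c) for X ~ binomial(n, tau): the error model used by the estimators."""
    return sum(math.comb(n, k) * tau**k * (1 - tau) ** (n - k) for k in range(c, n + 1))


def compound_binomial_tail(probs, c):
    """P(X >= c) for a sum of independent Bernoulli(p_j) items (compound binomial)."""
    dist = [1.0]  # dist[k] = probability of k successes among the items seen so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, d in enumerate(dist):
            new[k] += d * (1 - p)  # item answered incorrectly
            new[k + 1] += d * p    # item answered correctly
        dist = new
    return sum(dist[c:])


def agreement(pass_probs):
    """Coefficient of agreement from each examinee's probability of passing."""
    return mean(p**2 + (1 - p) ** 2 for p in pass_probs)


def simulate(pmat, reps=100, seed=1):
    """True coefficients and (mean, SD) of deviations for cut-off scores 1..n."""
    rng = random.Random(seed)
    n = len(pmat[0])
    cuts = range(1, n + 1)
    truth = [agreement([compound_binomial_tail(row, c) for row in pmat]) for c in cuts]
    devs = {"proportion correct": [[] for _ in cuts], "KR-20": [[] for _ in cuts]}
    for _ in range(reps):  # step 4: independent replications
        # Step 1: N x n independent Bernoulli trials with success probabilities p_ij.
        scores = [[1 if rng.random() < p else 0 for p in row] for row in pmat]
        # Step 2: proportion correct and KR-20 regression true score estimates
        # (clipping to [0, 1] is an assumption of this sketch).
        props = [sum(row) / n for row in scores]
        beta, mu_p = kr20(scores), mean(props)
        kr_taus = [min(1.0, max(0.0, beta * p + (1 - beta) * mu_p)) for p in props]
        # Step 3: binomial-model coefficient for every cut-off score.
        for i, c in enumerate(cuts):
            for name, taus in (("proportion correct", props), ("KR-20", kr_taus)):
                estimate = agreement([binomial_tail(n, t, c) for t in taus])
                devs[name][i].append(estimate - truth[i])
    # Step 5: mean deviation (bias) and standard deviation over replications.
    stats = {name: [(mean(d), pstdev(d)) for d in per_cut] for name, per_cut in devs.items()}
    return truth, stats


# Hypothetical 6 x 5 matrix of p_ij values (items homogeneous within each examinee).
pmat = [[0.6] * 5, [0.65] * 5, [0.7] * 5, [0.7] * 5, [0.75] * 5, [0.8] * 5]
truth, stats = simulate(pmat, reps=100, seed=1)
for i, t in enumerate(truth):
    print("cut-off", i + 1, "true", round(t, 3),
          {name: tuple(round(x, 3) for x in s[i]) for name, s in stats.items()})
```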

CONSIDERATIONS IN CONDUCTING THE SIMULATION
Number of Examinees and Items

Test lengths of 5, 10 and 20 items were chosen because these values are typical test lengths discussed in the criterion-referenced testing literature (cf. Novick and Lewis, 1974; Hambleton, Hutton and Swaminathan, in press). The numbers of examinees were 10 and 30. These numbers were thought to be representative of typical class sizes and different enough to detect the effects of changing the number of examinees.

Homogeneity of $p_{ij}$'s

The average within-examinee variance of the $p_{ij}$'s was small for all matrices, indicating that the items are homogeneous in difficulty for each examinee. These $p_{ij}$'s were chosen to simulate examinee response tendencies on criterion-referenced tests composed of items that are homogeneous in content. (See Millman (1974) for a discussion of whether criterion-referenced tests must be composed of items that are homogeneous in content.)

Sampling of Examinees 

The true scores remain the same for every replication. Therefore, estimation of the coefficient for a population of examinees on the basis of a random sample is not an issue. Rather, the issue is estimation of a coefficient for a population of administrations of the same test on the basis of data obtained from a single administration of the test. When a test is used to make decisions about a specific group of examinees, interest should reside in the replicability of the decisions for that group.

Sampling of Items 

It is often asserted that criterion-referenced tests should be constructed by following procedures that permit the items comprising a test to be interpreted as a random sample from a well-defined domain of items (cf. Hambleton, Swaminathan, Algina and Coulson, 1974; Millman, 1974). It follows that the coefficient of agreement expected for any two tests constructed by random sampling will be of interest. However, regardless of whether random sampling is actually accomplished, in many instructional contexts only one test is administered and decisions are based on this administration. Therefore, the coefficient of agreement expected for any two replications of the test (or of strictly parallel tests) is also of interest. This simulation focuses on the latter coefficient of agreement, and for this reason sampling of items is not an issue.

RESULTS
Statistics summarizing the results of the simulation are reported in Tables 2, 3 and 4. Statistics are not reported for the runs with 10 examinees because the mean deviations for these runs are quite similar to those for the runs with 30 examinees. The effects of the number of examinees on variability are discussed in a subsequent subsection. The results based on the KR-20 and KR-21 estimates of true scores typically differ only in the third decimal place, so the KR-21 results are not reported. The differences that do exist in the mean deviation generally favor the coefficient based on KR-20 true score estimates. The statistics for the cut-off scores not represented in Tables 2, 3 and 4 indicate that the estimates are quite accurate for those cut-off scores.

Insert Tables 2, 3 and 4 About Here






Several notable trends appearing in the data are summarized below. 
Effects of Cut-Off Score Changes

The bias of each coefficient of agreement estimator, as indexed by the absolute mean deviation, tends to be largest for cut-off scores near $n\mu_\tau$. For these cut-off scores the bias is positive for the proportion correct estimator and negative for the KR-20 estimator. As the distance between the cut-off score and $n\mu_\tau$ increases, the following pattern tends to occur for both estimators: the absolute value of the bias decreases until the sign of the bias changes, then increases, and finally decreases again. Aspects of the pattern occur for all tests, but the pattern occurs most clearly for the 20-item tests.

The variability of the estimators also tends to be larger for cut-off scores near $n\mu_\tau$ than for cut-off scores at the extremes of the possible observed score distribution. For cut-off scores near $n\mu_\tau$, the variability of the KR-20 estimator tends to be larger than that of the proportion correct estimator. However, even for these cut-off scores the variability of the KR-20 estimator is reasonably small: when N = 30 its standard deviation reaches a maximum of about .08.

Effects of Reliability 

The effects of varying $\sigma^2_\tau$ and of varying the number of items will be summarized under the single rubric of effects of reliability.

The bias of the proportion correct estimator tends to decrease with increasing reliability, while the bias of the KR-20 estimator tends to increase with increasing reliability. For almost all cut-off scores on tests with $\rho^2_{XT} < .35$, the bias of the KR-20 estimator is smaller than that of the proportion correct estimator and is quite small in absolute size. For the test with $\rho^2_{XT} = .47$, neither estimator is uniformly less biased. However, on this test the only relatively large biases occur with the KR-20 estimator for cut-off scores equal to seven and eight. For the test with $\rho^2_{XT} = .62$, the proportion correct estimator is less biased for almost all cut-off scores and the absolute values of its biases are fairly small. In addition, with the exception of cut-off scores 14, 15 and 16, the bias of the KR-20 estimator is also reasonably small.

Effects of Number of Examinees 

The bias of the estimators is unaffected by changing the number of examinees. The variability of both estimators increases as the number of examinees decreases; however, the effect is not very great. When N = 10 the maximum observed standard deviation is approximately 40 for the KR-20 estimator.



DISCUSSION 

Two of the results deserve further explanation. The first is the change in the sign of the bias as the cut-off score changes. Consider the idealized situation in which the true score estimates and the true scores have equal means and are linearly dependent. Then, for cut-off scores near $n\mu_\tau$, the coefficient of agreement calculated using the binomial distribution will be smaller for the less variable set of numbers. For cut-off scores at the extremes of the possible test scores, the coefficient will be larger for the less variable set. The simulation indicates that

[3]  $\overline{\hat\sigma^2_{\hat\tau}} < \sigma^2_\tau < \overline{\hat\sigma^2_p}$,

where the averages are taken over replications. In [3],

$\hat\sigma^2_{\hat\tau} = \hat\alpha_{20}^2\, \hat\sigma^2_p$,

where $\hat\alpha_{20}$ is the replication value of KR-20 (the value of $\hat\beta$ in [1]) and $\hat\sigma^2_p$ is the estimated proportion correct score variance. Therefore, near $n\mu_\tau$ the KR-20 estimator will tend to underestimate the coefficient of agreement calculated using $\tau_i = n^{-1}\sum_j p_{ij}$ in conjunction with the binomial distribution. For the extreme cut-off scores it will tend to overestimate that coefficient. In the present study this coefficient is a very close approximation to the true coefficients calculated using the compound binomial distribution. Therefore, the KR-20 estimator tends to underestimate the true coefficient near $n\mu_\tau$ and to overestimate it for the extreme cut-off scores. Moreover, since $\overline{\hat\sigma^2_p} > \sigma^2_\tau$, the opposite relationship holds for the proportion correct estimator.
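The idealized argument can be checked numerically with the Python sketch below. The sketch rests on these assumptions:

- Two hypothetical true scores with mean .71, so $n\mu_\tau$ = 14.2 for n = 20.
- A shrunken version of these scores toward their mean with slope 0.3, which is the form of equation [1].
- Binomial-model coefficients compared at a cut-off near $n\mu_\tau$ (c = 14) and at an extreme cut-off (c = 20).

The scores, the slope and the function names are illustrative, not taken from the study.

```python
import math
from statistics import mean, pvariance


def prob_at_least(n, tau, c):
    """P(X >= c) for X ~ binomial(n, tau)."""
    return sum(math.comb(n, k) * tau**k * (1 - tau) ** (n - k) for k in range(c, n + 1))


def agreement(taus, n, c):
    """Binomial-model coefficient of agreement (equation [2]) for a set of true scores."""
    values = []
    for t in taus:
        p_pass = prob_at_least(n, t, c)
        values.append(p_pass**2 + (1 - p_pass) ** 2)
    return mean(values)


def shrink(taus, beta):
    """Same mean, deviations scaled by beta: the form of equation [1] with slope beta."""
    mu = mean(taus)
    return [mu + beta * (t - mu) for t in taus]


n = 20
spread = [0.61, 0.81]         # hypothetical true scores; n * mean = 14.2
shrunk = shrink(spread, 0.3)  # less variable set with the same mean

print(pvariance(shrunk) / pvariance(spread))  # close to 0.3 ** 2 = 0.09
for c in (14, 20):            # a cut-off near n * mean, then an extreme cut-off
    print(c, round(agreement(spread, n, c), 3), round(agreement(shrunk, n, c), 3))
```

With these inputs, the shrunken set gives the smaller coefficient at c = 14 and the larger one at c = 20. Its variance is 0.3² = 0.09 times that of the original set, mirroring the relation $\hat\sigma^2_{\hat\tau} = \hat\alpha_{20}^2\hat\sigma^2_p$ for the KR-20 estimates.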

The second result requiring explanation is the relationship between the biases of the two estimators and $\rho^2_{XT}$. An explanation relevant to cut-off scores near $n\mu_\tau$ is offered below. A similar explanation can be extended to other cut-off scores, but in view of space limitations the extension is left to the reader.

For the KR-20 estimator, the keys to the explanation are that (1) the smallest possible estimated coefficient of agreement is .50, a value that can occur only when the estimated KR-20 = 0.00, and (2) the KR-20 estimator tends to underestimate the true coefficient for cut-off scores near $n\mu_\tau$. As $\rho^2_{XT}$ approaches zero, the true coefficient approaches .50 for cut-off scores near $n\mu_\tau$, and therefore the underestimation cannot possibly be great. On the other hand, when $\rho^2_{XT}$ is large the true coefficient can be substantially larger than .50, and the underestimation can be substantial. In Table 2 the mean deviations for cut-off scores 14 and 15 on examinations one and three illustrate these relationships. (The reader should note that the reported statistics or parameters for a particular cut-off score are not comparable between examinations one and two, or between two and three, because $\mu_\tau$ for examination two differs from $\mu_\tau$ for the other two examinations.)

For the proportion correct estimator, the keys are that (1) this estimator tends to overestimate the true coefficient of agreement, and (2) the true score distribution is estimated by the observed score distribution. The degree of overestimation will depend in part on the proportion of the estimated true score variance (here the observed proportion correct score variance) that is error variance. When $\rho^2_{XT}$ is low, this proportion is high and overestimation tends to be great. On the other hand, when $\rho^2_{XT}$ is large, the proportion of error score variance is smaller and therefore the overestimation is smaller.
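Point (1) for the KR-20 estimator, and the error share in the argument for the proportion correct estimator, rest on two short identities. Write $P_i = \mathrm{Prob}(np_i \ge c \mid \hat\tau_i)$ for the estimated passing probability in equation [2], and use the reliability definition $\rho^2_{XT} = \sigma^2_\tau/(\sigma^2_\tau + \sigma^2_E)$:

$$P_i^2 + (1 - P_i)^2 = \tfrac{1}{2} + 2\left(P_i - \tfrac{1}{2}\right)^2 \ge \tfrac{1}{2}, \qquad \frac{\sigma^2_E}{\sigma^2_\tau + \sigma^2_E} = 1 - \rho^2_{XT}.$$

The first identity makes every term of $\hat P_c$ at least .50, with equality only when $P_i = .50$. For $1 \le c \le n$, $P_i$ increases strictly with $\hat\tau_i$. Every $P_i$ can therefore equal .50 only if all $\hat\tau_i$ are equal, which under equation [1] happens when the estimated KR-20 is 0. The second identity shows the error share of the expected observed score variance falling as $\rho^2_{XT}$ rises, which is why the overestimation by the proportion correct estimator shrinks.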
The relationship between $\rho^2_{XT}$ and the biases of the two estimators suggests that when $\rho^2_{XT}$ is large, say greater than .50, the proportion correct estimator might be used. However, it should be noted that KR-20 is quite variable over replications and may be a poor guide to the choice of estimator. A better strategy may be to average the proportion correct and KR-20 coefficients of agreement when KR-20 is large.

CONCLUSION
The results indicate that, with few exceptions, accurate estimates of the coefficient of agreement can be obtained using the KR-20 estimate of true score in conjunction with the binomial error model. This holds at least for tests composed of items that are homogeneous in difficulty for each examinee. The coefficients estimated on this basis were substantially biased only for cut-off scores near $n\mu_\tau$ on tests with $\rho^2_{XT} \ge .47$. Moreover, the variability of the estimator was reasonably small in all cases.



Footnotes 




indicated that there was very little difference in the accuracy of estimated coefficients based on the two models, and therefore the cost of duplicating

error of measurement is binomial, then the regression parameter should be KR-21. However, when it is desired to estimate the proportion of mastery classifications that will be consistent for repeated administrations of the same or strictly parallel tests, KR-20 provides the better lower-bound estimate of the reliability of the test (Lord and Novick, 1968). It probably should be used even if the binomial distribution is employed for the sake of computational convenience.

A copy of tables reporting the entire set of results is available from the authors.
Table 1
Parameters Describing 18 Simulated Tests

Matrix Dimensions      Examination   True Score   Error      Mean True   Reliability
(examinees x items)                  Variance     Variance   Score

(30x20)                1             .0018        .0096      .71         .16
                       2             .0039        .0084      .76         .32
                       3             .0148        .0092      .70         .62
(30x10)                4             .0029        .0192      .70         .13
                       5             .0055        .0169      .75         .24
                       6             .0165        .0185      .69         .47
(30x5)                 7             .0047        .0389      .70         .11
                       8             .0051        .0336      .76         .13
                       9             .0170        .0367      .70         .32
(10x20)                10            .0017        .0096      .71
                       11            .0036        .0082      .76         .31
                       12            .0158        .0092      .70         .63
(10x10)                13            .0022        .0195      .70         .10
                       14            .0061        .0165      .76         .27
                       15            .0166        .0183      .69         .47
(10x5)                 16            .0035        .0381      .71         .08
                       17            .0070        .0314      .78         .18
                       18            .0144        .0375      .69         .28
Table 2
Indices of Bias and Variability for Two Coefficient of Agreement Estimators: n = 20, N = 30

                                              Cut-Off Scores
Parameter and Statistics      9     10     11     12     13     14     15     16     17     18     19

Examination 1 ($\rho^2_{XT}$ = .16)
True Coefficient              ?      ?      ?      ?      ?      ?      ?      ?      ?      ?      ?
Mean Deviation                ?  -.059      ?  -.027   .041   .112   .134   .084  -.004  -.058  -.056
                              ?      ?      ?  -.015      ?  -.031  -.032  -.015   .001   .005   .002
Standard Deviation            ?   .021   .024   .025   .025   .025   .026   .027   .026   .022   .018
                              ?   .014   .028   .044   .048   .031   .023   .044   .042   .024   .008

Examination 2 ($\rho^2_{XT}$ = .32)
True Coefficient              ?      ?      ?      ?      ?      ?   .632   .612   .674   .794   .911
Mean Deviation                ?      ?  -.044      ?  -.008   .040   .082   .090   .052  -.008  -.045
                              ?      ?      ?      ?      ?      ?      ?      ?      ?      ?      ?
Standard Deviation         .012   .015   .020   .024   .027   .028   .027   .025   .024   .026   .025
                           .014   .008   .016   .030   .047   .055   .043   .031   .048   .046   .025

Examination 3 ($\rho^2_{XT}$ = .62)
True Coefficient           .920   .873   .827   .791   .766   .751   .745   .753   .781   .839   .918
Mean Deviation            -.015   .002   .017   .025   .027   .026   .025   .023   .013  -.002  -.025
                           .027   .035   .026  -.004  -.044  -.076  -.085  -.060   .016   .025   .027
Standard Deviation         .020   .021   .022   .022   .025   .030   .032   .029   .023   .023   .023
                           .025   .030   .030   .030   .038   .047   .047   .036      ?   .033   .023

Note: For the rows corresponding to each statistic, the first line is for the proportion correct estimator and the second is for the KR-20 estimator. A question mark denotes an illegible entry.



Table 3
Indices of Bias and Variability for Two Coefficient of Agreement Estimators: n = 10, N = 30

                                    Cut-Off Scores
Parameter and Statistics      4      5      6      7      8      9     10

Examination 4 ($\rho^2_{XT}$ = .13)
True Coefficient              ?      ?      ?      ?      ?   .740   .936
Mean Deviation            -.055  -.056   .013   .118   .131   .012  -.059
                          -.001  -.002  -.009  -.022  -.022      ?   .000
Standard Deviation         .020   .028   .029   .027   .025   .027   .025
                           .012   .033   .057   .045   .029   .054   .024

Examination 5 ($\rho^2_{XT}$ = .24)
True Coefficient           .986   .945   .844   .694   .590   .657   .866
Mean Deviation            -.037  -.050  -.021   .055   .120   .074  -.019
                           .004   .011   .012  -.017  -.052  -.032   .010
Standard Deviation         .031   .044   .050   .049   .058   .053   .053
                           .019   .042   .070   .083   .066   .074   .060

Examination 6 ($\rho^2_{XT}$ = .47)
True Coefficient           .924   .838   .751   .700   .690   .745   .891
Mean Deviation            -.028   .001   .031   .044   .049   .038  -.007
                           .024   .028  -.027      ?  -.086   .007      ?
Standard Deviation         .021   .024   .030   .031   .032   .032   .027
                           .026   .039   .050   .057   .050   .046   .031

Note: For the rows corresponding to each statistic, the first line is for the proportion correct estimator and the second is for the KR-20 estimator. A question mark denotes an illegible entry.



Table 4
Indices of Bias and Variability for Two Coefficient of Agreement Estimators: n = 5, N = 30

                               Cut-Off Scores
Parameter and Statistics      2      3      4      5

Examination 7 ($\rho^2_{XT}$ = .11)
True Coefficient           .936   .737   .542   .724
Mean Deviation            -.056   .040   .165   .073
                          -.002  -.005  -.016  -.010
Standard Deviation         .024   .031   .033   .033
                           .025   .056   .032      ?

Examination 8 ($\rho^2_{XT}$ = .13)
True Coefficient           .969   .828      ?   .643
Mean Deviation            -.056  -.014   .144   .156
                          -.004  -.004  -.013  -.015
Standard Deviation         .024   .032   .036   .035
                           .019   .054   .055   .052

Examination 9 ($\rho^2_{XT}$ = .32)
True Coefficient              ?   .752      ?   .715
Mean Deviation            -.035   .029   .109   .100
                           .017  -.022      ?   .001
Standard Deviation         .023   .027   .032   .036
                           .029   .051   .040   .046

Note: For the rows corresponding to each statistic, the first line is for the proportion correct estimator and the second is for the KR-20 estimator. A question mark denotes an illegible entry.